This project addresses the ongoing opioid crisis in the of City of Mesa, Arizona by attempting to alleviate the existing strain on the city’s limited public health resources. To do so, our team is proposing to develop ‘FleetPulse’, an app designed to support a dedicated Overdose Prevention Fleet with improved allocation of drug prevention resources and needed medical assistance. Using data extracted from the City’s Open Data portal, the app would rely on a geospatial risk model to predict risk areas for heroin overdoses as a function of various environmental factors such as incidents of crime, distance to parks and metro stations, and localization of outgoing EMS calls. By using a predictive model, emergency and drug prevention support teams will have the means to become more proactive, and less reactive, in addressing cases of heroin overdoses, saving time, money, and importantly, lives.
The following analysis presents in further detail the process of developing the ‘FleetPulse’ model. In a first instance, by using different maps and visualizations tools this report investigates the concerning relationships between the spatial features previously mentioned and areas of high risk of overdose across Mesa. The second half will showcase the development of the risk prediction model and discuss its utility by presenting a variety of cross-validation results; in other words by testing the model’s levels of accuracy and generalizability across geographic units. To conclude, this study will share recommendations of potential next steps for the project and how best to continue to refine the ‘FleetPulse’ model and app to better serve the EMS fleets and community of Mesa more broadly.
Apps like FleetPulse and the research associated with it can be used to inform future studies on how machine learning can be helpful in predicting healthcare emergencies, streamline resources allocation and alleviate fatalities. To find out more about FleetPulse and the full range of its functionality and potential, click here to be redirected to our video pitch.
#knitr::opts_chunk$set(echo = TRUE, warning=FALSE)
library(tidyverse)
library(sf)
library(RSocrata)
library(viridis)
library(spatstat)
library(raster)
library(spdep)
library(FNN)
library(grid)
library(gridExtra)
library(RColorBrewer)
library(knitr)
library(kableExtra)
library(tidycensus)
library(classInt)
library(kableExtra)
library(RColorBrewer)
library(hrbrthemes)
library(ggthemes)
library(ggplot2)
library(plotly)
library(scales)
library(extrafont)
library(basetheme)
library(RColorBrewer)
library(lemon)
library(data.table)
library(GGally)
require(gridExtra)
library(ggalt)
library(rgeos)
library(sp)
library(smoothr)
library(rgdal)
library(rgeos)
library(maptools)
library(ggmap)
library(scales)
library(ggeasy)
library("wesanderson")
library(spdep)
library(caret)
library(ckanr)
library(FNN)
library(grid)
library(ggcorrplot)
library(corrr)
library(kableExtra)
library(jtools)
library(ggstance)
library(ggpubr)
library(broom.mixed)
library(tab)
library(sjPlot)
library(plotly)
library(stargazer)
library(pscl)
library(plotROC)
library(pROC)
library(lubridate)
col <- wes_palette("Zissou1")
col5 <- colorRampPalette(col)(5)
col6 <- colorRampPalette(col)(6)
col7 <- colorRampPalette(col)(7)
col8 <- colorRampPalette(col)(8)
col10 <- colorRampPalette(col)(10)
col22 <- colorRampPalette(col)(22)
col29 <- colorRampPalette(col)(29)
col96 <- colorRampPalette(col)(96)
pal <- c("#88969e","#465a65","#242e47","#9f6c5b","#f7c2a0")
pal10 <- colorRampPalette(pal)(10)
pal5 <- colorRampPalette(pal)(5)
pal8 <- colorRampPalette(pal)(8)
options(scipen = 999)
options(tigris_class = "sf")
root.dir = "https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/DATA/"
source("https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/functions.r")
# Aidan WD
#setwd("C:/Users/aidan/OneDrive/Documents/GitHub/508-Final")
# Chris WD
setwd("C:/Users/Chris Michael/Documents/GitHub/508-Final")
# Census API Key
#tidycensus::census_api_key("e6618c9a8d61015169c4b02abbfc2a06afefde08", overwrite = TRUE)
We started creating the markdown by installing the required packages and loading them into the library. Some of the mainly used packages are tidyverse, tidycensus, and dplyr. The final deliverable for this homework is an RMarkdown file with the required plots and tables, knitted to a HTML document.
The city of Mesa is located in Arizona, and has a population of approximately 520,000 residents. The city is currently facing a public health emergency- in 2021 alone, Mesa saw over 2000 cases of heroin overdose, affecting predominantly young adults. The following sections aim to visualize the data, look at the density and distribution of heroin overdoses throughout the city, and correlate them to various risk factors.
MesaHero <- st_read(paste0(getwd(),"/Fire_and_Medical_Opioid_Overdose_Incidents.csv"))
Mesa_Boundary <- st_read(paste0(getwd(),"/City Boundary.geojson")) %>%
st_transform('ESRI:102648')
Mesa_tracts <- st_read(paste0(getwd(),"/Mesa Census Tracts To City Boundary.geojson"))%>%
select(geoid, geometry)%>%
st_transform('ESRI:102648')
Overdose_Data <- MesaHero %>%
arrange(desc(Longitude))%>%
slice(2:7456)%>%
na.omit()%>%
st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326, agr = "constant")%>%
st_transform('ESRI:102648')
The maps below show the reported case of heroin overdoses in the city of Mesa between 2017-2022. From the density map, we can see that there is a high rate of incidences of heroin overdose to the west of the city of Mesa (denoted in light brown), as well as when we move to the center of the city.
A <- ggplot() +
geom_sf(data = Mesa_Boundary, size=0.5, fill="#f8f5ec")+
geom_sf(data = Overdose_Data, colour="#9f6c5b", size=0.8) +
labs(title= "Reported Cases of Heroin Overdoses",
subtitle = "City of Mesa, 2017-2022\n",
caption="Source: Mesa Data Portal")+
theme_void()
B <- ggplot() +
geom_sf(data = Mesa_Boundary, size=0.5, fill="#88969e") +
stat_density2d(data = data.frame(st_coordinates(Overdose_Data)),
aes(X, Y, fill = ..level.., alpha = ..level..),
size = 0.01, bins = 40, geom = 'polygon',show.legend = FALSE) +
scale_fill_gradientn(colors=pal) +
scale_alpha(range = c(0.00, 0.35), guide = FALSE) +
labs(title= "Density of Heroin Overdoses",
subtitle = "City of Mesa, 2017-2022\n",
caption="Source: Mesa Data Portal")+
theme_void()
ggarrange(A,B, align="v")
When exploring the racial and income context of Mesa, we can see that in the West and central areas of the city there is a higher concentration of racial minorities with lower median household incomes. As seen previously, these demographic and socio-economic characteristics generally overlap with areas with counts of heroin overdoses suggesting a potential relationship.
#For reference:
# ACS <- load_variables(2019, "acs5", cache = TRUE)
# Maricopa_Tot_Pop <- get_acs(geography = "county", table = c("B03001"),
# year = 2019, state=04, county=013, geometry=T)
# Mar Tot Pop = 4328810
# Education acs table "B15002"
#RACE
Maricopa_Race_1 <- get_acs(geography = "tract", table = c("B03002"),
year = 2019, state=04, county=013, geometry=T)%>%
st_transform('ESRI:102648')%>%
dplyr::select(variable, estimate, GEOID) %>%
spread(variable, estimate) %>%
rename(geoid = GEOID,
Total = B03002_001,
NHWhite = B03002_003,
NHBlack = B03002_004,
NHAIAN = B03002_005,
NHAsian = B03002_006,
NHNHPI = B03002_007,
Hispanic = B03002_012) %>%
mutate(pctWhite = NHWhite / Total,
pctBlack = NHBlack / Total,
pctHisp = Hispanic / Total,
raceContext = ifelse(pctWhite > .5, "Majority White", "Majority Non-White"))%>%
.[Mesa_tracts,]
Mesa_Race_1 <- merge(Maricopa_Race_1,st_drop_geometry(Mesa_tracts))
#MED HH INC
Maricopa_HH_Med_Inc_1 <- get_acs(geography = "tract", variables = ("B25119_001"),
year = 2019, state=04, county=013, geometry=T)%>%
st_transform('ESRI:102648')%>%
dplyr::select(variable, estimate, GEOID) %>%
spread(variable, estimate) %>%
rename(geoid = GEOID,
HH_Med_Inc = B25119_001)%>%
mutate(incContext = ifelse(HH_Med_Inc > 80000, "High Median Income", "Low Median Income")) %>%
.[Mesa_tracts,]
Mesa_HH_Med_Inc_1 <- merge(Maricopa_HH_Med_Inc_1,st_drop_geometry(Mesa_tracts))
# MAP
C <- ggplot() +
geom_sf(data = Mesa_tracts)+
geom_sf(data = Mesa_Race_1, aes(fill = pctWhite),color="white")+
scale_fill_gradient(low = "#f4f1e9", high = "#465a65",name="White Pop\n%")+
labs(title= "Total White Population (%)",
subtitle = "City of Mesa, 2019\n",
caption="Source: American Community Survey (ACS)")+
theme_void()
D <- ggplot() +
geom_sf(data = Mesa_tracts)+
geom_sf(data = Mesa_HH_Med_Inc_1, aes(fill = HH_Med_Inc),color="white")+
scale_fill_gradient(low = "#f4f1e9", high = "#252f48",name="Household\nIncome")+
labs(title= "Median Household Income",
subtitle = "City of Mesa, 2019\n",
caption="Source: American Community Survey (ACS)")+
theme_void()
ggarrange(C,D, align = "v")
Looking more closely at fixed amenities, park centroids appear to share broad spatial patterns to past reported cases of overdoses. Perhaps more interestingly, there appears to be a higher concentration of heroin overdoses surrounding the city’s light rail stations.
#PARKS
Parks_Data <- st_read(paste0(getwd(),"/Parks_Locations_And_Amenities.csv"))
Parks_Data$Latitude <- as.numeric(Parks_Data$Latitude)
Parks_Data$Longitude <- as.numeric(Parks_Data$Longitude)
Parks_Data_1 <- Parks_Data %>%
filter(Latitude <"33.5",
Longitude <"-111.95")%>%
na.omit()%>%
st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326, agr = "constant")%>%
st_transform('ESRI:102648')
#RAIL STATIONS
Stations_Data <-st_read(paste0(getwd(),"/Light_Rail_Ridership.csv")) %>%
select(Station, Location.1)%>%
group_by(Station)%>%
summarise(Location.1 = max(Location.1))
Stations_Data$Location.1 <- substr(Stations_Data$Location.1,8,25)
Stations_Data_1 <- Stations_Data %>%
mutate(Longitude = substr(Location.1,0,9),
Latitude = substr(Location.1,10,18))%>%
select(-Location.1)%>%
na.omit()%>%
st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326, agr = "constant")%>%
st_transform('ESRI:102648')
#MAPPING GRID ARRANGE
E <- ggplot() +
geom_sf(data = Mesa_Boundary, size=0.5, fill="#f8f5ec")+
geom_sf(data = Parks_Data_1, colour="#053a3c", size=0.9, shape=10) +
labs(title= "Parks and Green Spaces",
subtitle = "City of Mesa, 2021\n",
caption="Source: Mesa Data Portal")+
theme_void()
G <- ggplot() +
geom_sf(data = Mesa_Boundary, size=0.5, fill="#f8f5ec")+
geom_sf(data = Stations_Data_1, colour="#8a4b0d", size=1.5) +
labs(title= "Light Rail Stations",
subtitle = "City of Mesa, 2021\n",
caption="Source: Mesa Data Portal")+
theme_void()
ggarrange(E,G, align = "v")
The below maps showcase various incidents of crime between 2017 and 2022, more specifically looking at aggravated assaults, motor vehicle thefts, burglaries, and attempted manslaughter. While the total counts differ, the relative density of each of these types of crimes are similar, all heavily concentrated in the West and central neighborhoods of Mesa. Again, given the shared spatial patterns with past incidents of heroin overdose, we can begin to hypothesize a relationship between these risk factors and areas of high risk for overdose.
#ALL CRIME
Crime_Data <- st_read(paste0(getwd(),"/Crime_Reporting_Statistics_-_Uniform_Crime_Reporting__UCR_.csv"))
Crime_Data$Latitude <- as.numeric(Crime_Data$Latitude)
Crime_Data$Longitude <- as.numeric(Crime_Data$Longitude)
#ASSAULT
Assault_Data <- Crime_Data %>%
filter(Crime.Type == "Aggravated Assault" ) %>%
na.omit() %>%
st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326, agr = "constant")%>%
st_transform('ESRI:102648')
#MOTOR VEHICLE THEFT
Vehicle_Theft_Data <- Crime_Data%>%
filter(Crime.Type == "Motor Vehicle Theft" ) %>%
na.omit() %>%
st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326, agr = "constant")%>%
st_transform('ESRI:102648')
#BURGLARY
Burglary_Data <- Crime_Data %>%
filter(Crime.Type == "Burglary") %>%
na.omit() %>%
filter(Longitude >"-111.55")%>%
st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326, agr = "constant")%>%
st_transform('ESRI:102648')
#MANSLAUGHTER
Murder_Data <- Crime_Data %>%
filter(Crime.Type == "Murder/Non-negligent manslaughter") %>%
na.omit() %>%
filter(Longitude >"-111.55")%>%
st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326, agr = "constant")%>%
st_transform('ESRI:102648')
## MAPPING GRID ARRANGE
H <- ggplot() +
geom_sf(data = Mesa_Boundary, size=0.5, fill="#f8f5ec")+
geom_sf(data = Assault_Data, colour="#242e47", size=0.7, show.legend = "point") +
labs(title= "Aggravated Assaults",
subtitle = "City of Mesa, 2011-2019\n",
caption="Source: Mesa Data Portal")+
theme_void()
I <- ggplot() +
geom_sf(data = Mesa_Boundary, size=0.5, fill="#88969e") +
stat_density2d(data = data.frame(st_coordinates(Assault_Data)),
aes(X, Y, fill = ..level.., alpha = ..level..),
size = 0.01, bins = 40, geom = 'polygon') +
scale_fill_gradientn(colors=pal) +
scale_alpha(range = c(0.00, 0.35), guide = FALSE) +
labs(title= "Density of Aggravated Assaults",
subtitle = "City of Mesa, 2011-2019\n",
caption="Source: Mesa Data Portal")+
theme_void()+theme(legend.position = "none")
ggarrange(H,I, align = "v")
J <- ggplot() +
geom_sf(data = Mesa_Boundary, size=0.5, fill="#f8f5ec")+
geom_sf(data = Vehicle_Theft_Data, colour="#a1aeb6", size=0.7) +
labs(title= "Motor Vehicle Thefts",
subtitle = "City of Mesa, 2011-2019\n",
caption="Source: Mesa Data Portal")+
theme_void()
K <- ggplot() +
geom_sf(data = Mesa_Boundary, size=0.5, fill="#88969e") +
stat_density2d(data = data.frame(st_coordinates(Vehicle_Theft_Data)),
aes(X, Y, fill = ..level.., alpha = ..level..),
size = 0.01, bins = 40, geom = 'polygon') +
scale_fill_gradientn(colors=pal) +
scale_alpha(range = c(0.00, 0.35), guide = FALSE) +
labs(title= "Density of Motor Vehicle Thefts",
subtitle = "City of Mesa, 2011-2019\n",
caption="Source: Mesa Data Portal")+
theme_void()+theme(legend.position = "none")
ggarrange(J,K, align = "v")
L <- ggplot() +
geom_sf(data = Mesa_Boundary, size=0.5, fill="#f8f5ec")+
geom_sf(data = Burglary_Data, colour="#465a65", size=0.7) +
labs(title= "Burglaries",
subtitle = "City of Mesa, 2011-2019\n",
caption="Source: Mesa Data Portal")+
theme_void()
M <- ggplot() +
geom_sf(data = Mesa_Boundary, size=0.5, fill="#88969e") +
stat_density2d(data = data.frame(st_coordinates(Burglary_Data)),
aes(X, Y, fill = ..level.., alpha = ..level..),
size = 0.01, bins = 40, geom = 'polygon') +
scale_fill_gradientn(colors=pal) +
scale_alpha(range = c(0.00, 0.35), guide = FALSE) +
labs(title= "Density of Burglaries",
subtitle = "City of Mesa, 2011-2019\n",
caption="Source: Mesa Data Portal")+
theme_void()+theme(legend.position = "none")
ggarrange(L,M, align = "v")
R <- ggplot() +
geom_sf(data = Mesa_Boundary, size=0.5, fill="#f8f5ec")+
geom_sf(data = Murder_Data, colour="#242e47", size=0.7) +
labs(title= "Attempted Manslaughter",
subtitle = "City of Mesa, 2011-2019\n",
caption="Source: Mesa Data Portal")+
theme_void()
S <- ggplot() +
geom_sf(data = Mesa_Boundary, size=0.5, fill="#88969e") +
stat_density2d(data = data.frame(st_coordinates(Murder_Data)),
aes(X, Y, fill = ..level.., alpha = ..level..),
size = 0.01, bins = 40, geom = 'polygon') +
scale_fill_gradientn(colors=pal) +
scale_alpha(range = c(0.00, 0.35), guide = FALSE) +
labs(title= "Density of Attempted Manslaughters",
subtitle = "City of Mesa, 2011-2019\n",
caption="Source: Mesa Data Portal")+
theme_void()+theme(legend.position = "none")
ggarrange(R,S, align = "v")
The last set of data this study took into account were counts of observed homelessness between 2017 and 2022, and the localization of outgoing EMS calls in 2021. Unsurprisingly, there is a strong visual parallel between the homeless population and the light rail metro line. As observed across the nation, transit stations continue to act as a many source of shelter for the unhoused. EMS calls are scattered across the city with significantly fewer in the south most neighborhoods.
#HOMELESSNESS
Homeless_Data <- st_read(paste0(getwd(),"/Unsheltered Point in Time (PIT) Count Phoenix Metro Area.geojson")) %>%
filter(city == "Mesa",
latitude <33.6,
longitude <"-111.95")%>%
st_as_sf(coords = c("longitude", "latitude"), crs = 4326, agr = "constant")%>%
st_transform('ESRI:102648')
#EMS CALLS
EMS_Data <- st_read(paste0(getwd(),"/EMS_Calls_1.xlsx")) %>%
na.omit() %>%
st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326, agr = "constant")%>%
st_transform('ESRI:102648')
#MAPPING
N <- ggplot() +
geom_sf(data = Mesa_Boundary, size=0.5, fill="#f8f5ec")+
geom_sf(data = Homeless_Data, colour="#9f6c5b", size=0.7) +
labs(title= "Counts of Homelessness",
subtitle = "City of Mesa, 2017-2022\n",
caption="Source: Mesa Data Portal")+
theme_void()
O <- ggplot() +
geom_sf(data = Mesa_Boundary, size=0.5, fill="#88969e") +
stat_density2d(data = data.frame(st_coordinates(Homeless_Data)),
aes(X, Y, fill = ..level.., alpha = ..level..),
size = 0.01, bins = 40, geom = 'polygon') +
scale_fill_gradientn(colors=pal) +
scale_alpha(range = c(0.00, 0.35), guide = FALSE) +
labs(title= "Density of Counts of Homelessness",
subtitle = "City of Mesa, 2017-2022\n",
caption="Source: Mesa Data Portal")+
theme_void()+theme(legend.position = "none")
ggarrange(N,O, align = "v")
P <- ggplot() +
geom_sf(data = Mesa_Boundary, size=0.5, fill="#f8f5ec")+
geom_sf(data = EMS_Data, colour="#c9a38b", size=0.7) +
labs(title= "EMS Calls",
subtitle = "City of Mesa, 2021\n",
caption="Source: Mesa Data Portal")+
theme_void()
Q <- ggplot() +
geom_sf(data = Mesa_Boundary, size=0.5, fill="#88969e") +
stat_density2d(data = data.frame(st_coordinates(EMS_Data)),
aes(X, Y, fill = ..level.., alpha = ..level..),
size = 0.01, bins = 40, geom = 'polygon') +
scale_fill_gradientn(colors=pal) +
scale_alpha(range = c(0.00, 0.35), guide = FALSE) +
labs(title= "Density of EMS Calls",
subtitle = "City of Mesa, 2021\n",
caption="Source: Mesa Data Portal")+
theme_void()+theme(legend.position = "none")
ggarrange(P,Q, align = "v")
To support building the risk prevention model described above, this study chose to aggregate past accounts of heroin overdose into a fishnet style grid to further assess the spatial relationship with additional re-configured variables.
fishnet <-
st_make_grid(Mesa_Boundary,
cellsize = 1000,
square = TRUE) %>%
.[Mesa_Boundary] %>% # fast way to select intersecting polygons
st_sf() %>%
mutate(uniqueID = 1:n())
#OVERDOSE
Overdose <-
dplyr::select(Overdose_Data) %>%
mutate(countOverdose = 1) %>%
aggregate(., fishnet, sum) %>%
mutate(countOverdose = replace_na(countOverdose, 0),
uniqueID = 1:n(),
cvID = sample(round(nrow(fishnet) / 24),
size=nrow(fishnet), replace = TRUE))
ggplot() +
geom_sf(data = Overdose, aes(fill = countOverdose), color = NA) +
scale_fill_gradientn(colors=pal8, name="Heroin Overdoses per Cell") +
labs(title = "Count of Heroin Overdose Incidents for the Fishnet",
subtitle="City of Mesa, 2017-2022\n",
caption="\nSource: Mesa Data Portal")+
theme_void()
The maps below shows the clustering of the risk factors for each fishnet cell. This is later useful for aggregation by k nearest neighbor and other spatial functions. This row of maps shows the density of spatial features.
#RACE
Maricopa_Race <- get_acs(geography = "tract", table = c("B03002"),
year = 2019, state=04, county=013, geometry=T)%>%
dplyr::select(variable, estimate, GEOID) %>%
spread(variable, estimate) %>%
rename(geoid = GEOID,
Total = B03002_001,
NHWhite = B03002_003,
NHBlack = B03002_004,
NHAIAN = B03002_005,
NHAsian = B03002_006,
NHNHPI = B03002_007,
Hispanic = B03002_012) %>%
mutate(pctWhite = NHWhite / Total,
pctBlack = NHBlack / Total,
pctHisp = Hispanic / Total,
raceContext = ifelse(pctWhite > .5, "Majority White", "Majority Non-White"))%>%
select(pctWhite, raceContext)%>%
st_transform(st_crs(fishnet))%>%
mutate(Legend = 'Race')
Mesa_Race <- merge(Maricopa_Race,st_drop_geometry(Mesa_tracts))
#MED HH INC
Maricopa_HH_Med_Inc <- get_acs(geography = "tract", variables = ("B25119_001"),
year = 2019, state=04, county=013, geometry=T)%>%
dplyr::select(variable, estimate, GEOID) %>%
spread(variable, estimate) %>%
rename(geoid = GEOID,
HH_Med_Inc = B25119_001)%>%
mutate(incContext = ifelse(HH_Med_Inc > 80000, "High Median Income", "Low Median Income")) %>%
select(incContext) %>%
st_transform(st_crs(fishnet))%>%
mutate(Legend = 'Med HH Inc')
Mesa_HH_Med_Inc <- merge(Maricopa_HH_Med_Inc,st_drop_geometry(Mesa_tracts))
#PARKS
Parks_Data <- Parks_Data %>%
filter(Latitude <"33.5",
Longitude <"-111.95")%>%
select(X = Longitude, Y = Latitude)%>%
na.omit()%>%
st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant")%>%
st_transform(st_crs(fishnet))%>%
mutate(Legend = 'Parks')
#RAIL STATIONS
Stations_Data <- Stations_Data %>%
mutate(Longitude = substr(Location.1,0,9),
Latitude = substr(Location.1,10,18))%>%
select(-Location.1)%>%
select(X = Longitude, Y = Latitude)%>%
na.omit()%>%
st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant")%>%
st_transform(st_crs(fishnet))%>%
mutate(Legend = 'Stations')
#ASSAULT
Assault_Data <- Crime_Data %>%
filter(Crime.Type == "Aggravated Assault" ) %>%
select(X = Longitude, Y = Latitude)%>%
na.omit() %>%
st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant")%>%
st_transform(st_crs(fishnet))%>%
mutate(Legend = 'Assault')
#MOTOR VEHICLE THEFT
Vehicle_Theft_Data <- Crime_Data%>%
filter(Crime.Type == "Motor Vehicle Theft" ) %>%
select(X = Longitude, Y = Latitude)%>%
na.omit() %>%
st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant")%>%
st_transform(st_crs(fishnet))%>%
mutate(Legend = "Vehicle Theft")
#BURGLARY
Burglary_Data <- Crime_Data %>%
filter(Crime.Type == "Burglary") %>%
select(X = Longitude, Y = Latitude)%>%
na.omit() %>%
filter(X >"-111.55")%>%
st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant")%>%
st_transform(st_crs(fishnet))%>%
mutate(Legend = "Burglary")
#MANSLAUGHTER
Murder_Data <- Crime_Data %>%
filter(Crime.Type == "Murder/Non-negligent manslaughter") %>%
select(X = Longitude, Y = Latitude)%>%
na.omit() %>%
filter(X >"-111.55")%>%
st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant")%>%
st_transform(st_crs(fishnet))%>%
mutate(Legend = "Murder")
#HOMELESSNESS
Homeless_Data <- st_read(paste0(getwd(),"/Unsheltered Point in Time (PIT) Count Phoenix Metro Area.geojson")) %>%
filter(city == "Mesa",
latitude <33.6,
longitude <"-111.95")%>%
select(X = longitude, Y = latitude)%>%
na.omit() %>%
st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant")%>%
st_transform(st_crs(fishnet))%>%
mutate(Legend = "Homeless Count")%>%
select(-X, -Y)
col_order <- c("geometry", "Legend")
Homeless_Data <- Homeless_Data[,col_order]
#EMS CALLS
EMS_Data <- st_read(paste0(getwd(),"/EMS_Calls_1.xlsx")) %>%
select(X = Longitude, Y = Latitude)%>%
na.omit() %>%
st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant")%>%
st_transform(st_crs(fishnet)) %>%
mutate(Legend = "EMS Calls")
st_c <- st_coordinates
st_coid <- st_centroid
vars_net <-
rbind(EMS_Data, Homeless_Data, Burglary_Data, Vehicle_Theft_Data, Assault_Data, Stations_Data, Parks_Data, Murder_Data) %>%
st_join(fishnet, join=st_within) %>%
st_drop_geometry() %>%
group_by(uniqueID, Legend) %>%
summarize(count = n()) %>%
left_join(fishnet, ., by = "uniqueID") %>%
spread(Legend, count, fill=0) %>%
dplyr::select(-`<NA>`) %>%
ungroup()
vars_net.long <-
gather(vars_net, Variable, value, -geometry, -uniqueID)
vars <- unique(vars_net.long$Variable)
mapList <- list()
for(i in vars)
mapList[[i]] <-
ggplot() +
geom_sf(data = filter(vars_net.long, Variable == i), aes(fill=value), colour=NA) +
scale_fill_gradientn(colors=pal, name="") +
labs(title=i) +
theme_void()
do.call(grid.arrange,c(mapList, nrow=4))
In this section, the k-nearest-neighbor function is used to find the spatial distribution of the risk factors (from the previous section), to calculate the average distance from each grid cell to the closest 3 incidents of each of the various spatial features previously mentioned. These rows of heat density maps help visualize the average distance to the different risk factors.
#Calculating variables k nearest neighbors
st_c<- st_coordinates
st_coid <- st_centroid
vars_net <- vars_net %>%
mutate(Parks.nn = nn_function(st_c(st_coid(vars_net)),
st_c(Parks_Data),
k = 3),
Stations.nn = nn_function(st_c(st_coid(vars_net)),
st_c(Stations_Data),
k = 3),
Homeless.nn = nn_function(st_c(st_coid(vars_net)),
st_c(Homeless_Data),
k = 3),
EMS.nn = nn_function(st_c(st_coid(vars_net)),
st_c(EMS_Data),
k = 3),
Burglary.nn = nn_function(st_c(st_coid(vars_net)),
st_c(Burglary_Data),
k = 3),
Assault.nn = nn_function(st_c(st_coid(vars_net)),
st_c(Assault_Data),
k = 3),
Vehicle_Theft.nn = nn_function(st_c(st_coid(vars_net)),
st_c(Vehicle_Theft_Data),
k = 3),
Murder.nn = nn_function(st_c(st_coid(vars_net)),
st_c(Murder_Data),
k = 3))
#VISUALIZING FISHNETS
vars_net1 <- vars_net %>%
select(-c(2:9))
vars_net.long.nn <-
gather(vars_net1, Variable, value, -geometry, -uniqueID)
vars <- unique(vars_net.long.nn$Variable)
mapList <- list()
for(i in vars)
mapList[[i]] <-
ggplot() +
geom_sf(data = filter(vars_net.long.nn, Variable == i), aes(fill=value), colour=NA) +
scale_fill_gradientn(colors=pal, name="") +
labs(title=i) +
theme_void()
do.call(grid.arrange,c(mapList, nrow=4))
In the following section, we can see that the Moran’s I statistic is near 20% in the West and some more central areas of Mesa which suggests some moderate positive spatial autocorrelation, in other words clustering of opioid overdoses. Furthermore, the associated p-value for this geography appears to be smaller than 0.25 suggesting that our model errors cluster more than what we might expect due to random chance alone, again implying some spatial correlation in our heroin overdose data. The overdose ‘hot-spots’ fishnet further reaffirms this observation.
#drop geom
final_net <-
left_join(Overdose, st_drop_geometry(vars_net), by="uniqueID")
final_net <-
st_centroid(final_net) %>%
st_join(dplyr::select(Mesa_tracts, geoid), by = "uniqueID") %>%
st_drop_geometry() %>%
left_join(dplyr::select(final_net, geometry, uniqueID)) %>%
st_sf() %>%
na.omit()
## make polygon to tracts
final_net.nb <- poly2nb(as_Spatial(final_net), queen=TRUE)
## ... and tracts to list of weights
final_net.weights <- nb2listw(final_net.nb, style="W", zero.policy=TRUE)
## see local moran
local_morans <- localmoran(final_net$countOverdose, final_net.weights, zero.policy=TRUE) %>%
as.data.frame()
# join local Moran's I results to fishnet
final_net.localMorans <-
cbind(local_morans, as.data.frame(final_net)) %>%
st_sf() %>%
dplyr::select(Overdose_Count = countOverdose,
Local_Morans_I = Ii,
P_Value = `Pr(z != E(Ii))`) %>%
mutate(Significant_Hotspots = ifelse(P_Value <= 0.001, 1, 0)) %>%
gather(Variable, Value, -geometry)
#plot
vars <- unique(final_net.localMorans$Variable)
varList <- list()
for(i in vars){
varList[[i]] <-
ggplot() +
geom_sf(data = filter(final_net.localMorans, Variable == i),
aes(fill = Value), colour=NA) +
scale_fill_gradientn(colors=pal) +
labs(title=i) +
theme_void()}
do.call(ggarrange,c(varList))
Once again by using the nearest neighbor feature, we can create a fishnet grid which displays the distance to the significant hot spots for heroin overdose showcased previously.
final_net <- final_net %>%
mutate(overdose.isSig =
ifelse(local_morans[,5] <= 0.001, 1, 0)) %>%
mutate(overdose.isSig.dist =
nn_function(st_c(st_coid(final_net)),
st_c(st_coid(filter(final_net,
overdose.isSig == 1))),
k = 1))
ggplot() +
geom_sf(data = final_net, aes(fill=overdose.isSig.dist), colour=NA) +
scale_fill_gradientn(colors=pal10, name="NN Distance") +
labs(title = "Heroin Overdoses (NN Distance)",
subtitle="City of Mesa, 2017-2022\n",
caption="\nSource: Mesa Data Portal")+
theme_void()
The series of graphs shown below indicate the Number of Heroin Overdoses as a function of Risk Factors. The graphs indicate a negative correlation between distance to all spatial risk factors and the dependent variable- occurrences of heroin overdoses. In other words, the closer you are an incident of crime, to a station, to an outgoing EMS call, etc., there is a higher chance of a heroin overdose will take place.
#correlation
correlation.long <-
st_drop_geometry(final_net) %>%
dplyr::select(-uniqueID, -cvID, -geoid, -overdose.isSig, -overdose.isSig.dist) %>%
gather(Variable, Value, -countOverdose)
correlation.long$Value <- as.numeric(correlation.long$Value)
correlation.cor <-
correlation.long %>%
group_by(Variable) %>%
summarize(correlation = cor(Value, countOverdose))
#plot
ggplot(correlation.long, aes(Value, countOverdose)) +
geom_point(size = 0.8) +
geom_text(data = correlation.cor, aes(label = paste("r =", round(correlation, 2))),
x=-Inf, y=Inf, vjust = 1.5, hjust = -.1) +
geom_smooth(method = "lm", se = FALSE, colour = "black") +
facet_wrap(~Variable, ncol = 2, scales = "free") +
theme_minimal()
The histogram below indicates distribution of occurrences of the dependent variable - counts of heroin overdoses in the city - suggesting a decent amount of sprawl with some levels of clustering.
ggplot(Overdose, aes(countOverdose)) +
geom_histogram(binwidth = 4, color = '#242e47', fill="#465a65") +
labs(title = "Distribution of Heroin Overdoses in Mesa, AZ",
subtitle="City of Mesa, 2017-2022\n",
caption="\nSource: Mesa Data Portal")+
theme_minimal()
As hinted to at the beginning of the analysis, this study chose to predict areas of risk of heroin overdose by creating a Poisson regression. By using cross validation techniques, we can test the generalizability and accuracy of our model. The first method entails holding out data from one unit area and training the model on the remaining areas. We then test our model by predicting for the single held out area and comparing our results with actual observed occurrences of heroin overdose. The second technique is coined ‘Leave-One-Group-Out’ Cross-Validation (LOGO-CV) and operates similarly to the above. Instead, however, this method holds out each census tract one by one to test the generalizability of the model. We present two models to compare, both which account for all original spatial features previously mentioned. The second model, however, considers the distance to spatial features in addition to significant hot spots.
The series of histograms below present the distribution of error of our two models when testing the geographic generalizability with both the random k-fold and LOGO-CV methods. Overall, we can see the model is significantly improved by adding spatial features (distance to the hot-spots) given there is a stronger clustering of error towards the zero mark line. Furthermore, our model generalizes better across smaller spatial units (i.e grid cell) than across whole tracts. This is further validated by the lowest Mean Absolute Error being 0.98.
#include only distance to spatial features
reg.vars <- c("Parks.nn", "Stations.nn",
"Homeless.nn", "EMS.nn", "Burglary.nn", "Assault.nn", "Vehicle_Theft.nn")
#with distance to hotspots added
reg.ss.vars <- c("Parks.nn", "Stations.nn",
"Homeless.nn", "EMS.nn", "Burglary.nn", "Assault.nn", "Vehicle_Theft.nn", "overdose.isSig", "overdose.isSig.dist")
#Poisson Regressions:
##Hold out 1 unit area (cvID) - K-fold
reg.cv <- crossValidate(
dataset = final_net,
id = "cvID",
dependentVariable = "countOverdose",
indVariables = reg.vars) %>%
dplyr::select(cvID = cvID, countOverdose, Prediction, geometry)
reg.ss.cv <- crossValidate(
dataset = final_net,
id = "cvID",
dependentVariable = "countOverdose",
indVariables = reg.ss.vars) %>%
dplyr::select(cvID = cvID, countOverdose, Prediction, geometry)
##Hold out one tract - LOGO VC
reg.spatialCV <- crossValidate(
dataset = final_net,
id = "geoid",
dependentVariable = "countOverdose",
indVariables = reg.vars) %>%
dplyr::select(cvID = geoid, countOverdose, Prediction, geometry)
reg.ss.spatialCV <- crossValidate(
dataset = final_net,
id = "geoid",
dependentVariable = "countOverdose",
indVariables = reg.ss.vars) %>%
dplyr::select(cvID = geoid, countOverdose, Prediction, geometry)
### II. Model Errors: Difference between Predicted and Observed
#summary
reg.summary <-
rbind(
mutate(reg.cv, Error = Prediction - countOverdose,
Regression = "Random k-fold CV: Just Risk Factors"),
mutate(reg.ss.cv, Error = Prediction - countOverdose,
Regression = "Random k-fold CV: Spatial Process"),
mutate(reg.spatialCV, Error = Prediction - countOverdose,
Regression = "Spatial LOGO-CV: Just Risk Factors"),
mutate(reg.ss.spatialCV, Error = Prediction - countOverdose,
Regression = "Spatial LOGO-CV: Spatial Process")) %>%
st_sf()
#plot
error_by_reg_and_fold <-
reg.summary %>%
group_by(Regression, cvID) %>%
summarize(Mean_Error = mean(Prediction - countOverdose, na.rm = T),
MAE = mean(abs(Mean_Error), na.rm = T),
SD_MAE = mean(abs(Mean_Error), na.rm = T)) %>%
ungroup()
error_by_reg_and_fold %>%
ggplot(aes(MAE)) +
geom_histogram(bins = 50, color = '#242e47', fill="#465a65") +
facet_wrap(~Regression) +
geom_vline(xintercept = 0) + scale_x_continuous(breaks = seq(0, 8, by = 1)) +
labs(title="Distribution of MAE", subtitle = "k-fold cross validation vs. LOGO-CV",
x="Mean Absolute Error", y="Count") +
theme_minimal()
#table
st_drop_geometry(error_by_reg_and_fold) %>%
group_by(Regression) %>%
summarize(Mean_MAE = round(mean(MAE), 2),
SD_MAE = round(sd(MAE), 2)) %>%
kable() %>%
kable_styling(full_width = F)
| Regression | Mean_MAE | SD_MAE |
|---|---|---|
| Random k-fold CV: Just Risk Factors | 0.91 | 0.90 |
| Random k-fold CV: Spatial Process | 0.91 | 0.89 |
| Spatial LOGO-CV: Just Risk Factors | 1.51 | 1.96 |
| Spatial LOGO-CV: Spatial Process | 1.50 | 2.07 |
The maps below model present the mean absolute error of our two models when using the Random K-Fold and LOGO Spatial Cross Validation techniques. Given the granularity of each spatial unit (i.e grid cell), it is difficult to visually assess the generalizability our of our model. However, when using the later CV technique in the second set of maps, we can see that our model predicts risk levels of heroin overdose with higher error in the more Western neighborhoods of the city.
error_by_reg_and_fold %>%
ggplot() +
geom_sf(aes(fill = MAE)) +
facet_wrap(~Regression) +
scale_fill_gradientn(colors=pal10) +
theme_void() + theme(legend.position="bottom")
This section uses demographic data for the City of Mesa for the year 2019, to understand how well the model predicts incidences of heroin overdoses for areas that are majority white and majority non-white. Looking at the below table, we can see that our two models predict incidents of overdose risk with smaller error in areas with majority white residents suggesting concerning biases in our model and/or chosen data that needs to be addressed.
reg.summary %>%
st_centroid() %>%
st_join(Mesa_Race_1) %>%
na.omit() %>%
st_drop_geometry() %>%
group_by(Regression, raceContext) %>%
summarize(mean.Error = mean(Error, na.rm = T)) %>%
spread(raceContext, mean.Error) %>%
kable(caption = "Mean Error by Tract Racial Context", digits = 2) %>%
kable_styling(full_width = F)
| Regression | Majority Non-White | Majority White |
|---|---|---|
| Random k-fold CV: Just Risk Factors | -0.71 | 0.13 |
| Random k-fold CV: Spatial Process | -0.69 | 0.12 |
| Spatial LOGO-CV: Just Risk Factors | -0.73 | 0.14 |
| Spatial LOGO-CV: Spatial Process | -0.70 | 0.12 |
Here, our study utilizes median household income data for the City of Mesa for the year 2019, to understand how well the model predicts incidences of heroin overdoses for areas with high or low income. Interestingly, the table suggests that our two models predict overdose risk with smaller error in census tracts with lower income rates.
reg.summary %>%
st_centroid() %>%
st_join(Mesa_HH_Med_Inc_1) %>%
na.omit() %>%
st_drop_geometry() %>%
group_by(Regression, incContext) %>%
summarize(mean.Error = mean(Error, na.rm = T)) %>%
spread(incContext, mean.Error) %>%
kable(caption = "Mean Error by Tract Income Context", digits = 2) %>%
kable_styling(full_width = F)
| Regression | High Median Income | Low Median Income |
|---|---|---|
| Random k-fold CV: Just Risk Factors | 0.26 | -0.16 |
| Random k-fold CV: Spatial Process | 0.17 | -0.11 |
| Spatial LOGO-CV: Just Risk Factors | 0.26 | -0.15 |
| Spatial LOGO-CV: Spatial Process | 0.18 | -0.11 |
To later compare our model with a kernel density model that flattens out incidences, kernel density maps are plotted at 3 different search radii. It can be observed that in each case, there seems to be two primary hotspots for incidences of heroin overdose.
# demo of kernel width
dose_ppp <- as.ppp(st_coordinates(Overdose_Data), W = st_bbox(final_net))
dose_KD.1000 <- spatstat.core::density.ppp(dose_ppp, 1000)
dose_KD.1500 <- spatstat.core::density.ppp(dose_ppp, 1500)
dose_KD.2000 <- spatstat.core::density.ppp(dose_ppp, 2000)
dose_KD.df <- rbind(
mutate(data.frame(rasterToPoints(mask(raster(dose_KD.1000), as(Mesa_tracts, 'Spatial')))), Legend = "1000 Ft."),
mutate(data.frame(rasterToPoints(mask(raster(dose_KD.1500), as(Mesa_tracts, 'Spatial')))), Legend = "1500 Ft."),
mutate(data.frame(rasterToPoints(mask(raster(dose_KD.2000), as(Mesa_tracts, 'Spatial')))), Legend = "2000 Ft."))
dose_KD.df$Legend <- factor(dose_KD.df$Legend, levels = c("1000 Ft.", "1500 Ft.", "2000 Ft."))
ggplot(data=dose_KD.df, aes(x=x, y=y)) +
geom_raster(aes(fill=layer)) +
facet_wrap(~Legend) +
coord_sf(crs=st_crs(final_net)) +
scale_fill_gradientn(colors=pal10, name="Density") +
labs(title = "Kernel density with 3 different search radii\n\n") +
theme_void()
The map below overlays the kernel density model with fishnet cells to show clustering of heroin overdose incidence hotspots.
as.data.frame(dose_KD.1000) %>%
st_as_sf(coords = c("x", "y"), crs = st_crs(final_net)) %>%
aggregate(., final_net, mean) %>%
ggplot() +
geom_sf(aes(fill=value)) +
geom_sf(data = sample_n(Overdose_Data, 1500), size = .1) +
scale_fill_gradientn(colors=pal10, name="Density") +
labs(title = "Kernel Density of of 2017-2022 Overdoses",
subtitle="City of Mesa, 2017-2022\n",
caption="Source: Mesa Data Portal")+
theme_void()
In this section, the maps below compare the kernel density model to the FleetPulse risk prediction model. For the purpose of this study, we chose to split our findings into 5 main risk levels for heroin overdose with 5 being areas with the highest risk. It can be observed that there are some overlap in primary hot-spots between our model’s predictions and the kernel density results. However, the latter seems to over simplify the occurrences of incidents of heroin overdose by potentially focusing too heavily on the granularity of the data and, therefor, under predicting for high risk zones. While our risk prediction model may appears to over predict high risk areas, doing the contrary could be detrimental to the lives of many Mesa residents caught in the opiod crisis.
dose_KDE_sum <- as.data.frame(dose_KD.1000) %>%
st_as_sf(coords = c("x", "y"), crs = st_crs(final_net)) %>%
aggregate(., final_net, mean)
kde_breaks <- classIntervals(dose_KDE_sum$value,
n = 5, "fisher")
dose_KDE_sf <- dose_KDE_sum %>%
mutate(label = "Kernel Density",
Risk_Category = classInt::findCols(kde_breaks),
Risk_Category = case_when(
Risk_Category == 5 ~ "5th",
Risk_Category == 4 ~ "4th",
Risk_Category == 3 ~ "3rd",
Risk_Category == 2 ~ "2nd",
Risk_Category == 1 ~ "1st")) %>%
cbind(
aggregate(
dplyr::select(Overdose_Data) %>% mutate(OverdoseCount = 1), ., sum) %>%
mutate(OverdoseCount = replace_na(OverdoseCount, 0))) %>%
dplyr::select(label, Risk_Category, OverdoseCount)
ml_breaks <- classIntervals(reg.spatialCV$Prediction,
n = 5, "fisher")
dose_risk_sf <-
reg.spatialCV %>%
mutate(label = "Risk Predictions",
Risk_Category =classInt::findCols(ml_breaks),
Risk_Category = case_when(
Risk_Category == 5 ~ "5th",
Risk_Category == 4 ~ "4th",
Risk_Category == 3 ~ "3rd",
Risk_Category == 2 ~ "2nd",
Risk_Category == 1 ~ "1st")) %>%
cbind(
aggregate(
dplyr::select(Overdose_Data) %>% mutate(OverdoseCount = 1), ., sum) %>%
mutate(OverdoseCount = replace_na(OverdoseCount, 0))) %>%
dplyr::select(label,Risk_Category, OverdoseCount)
rbind(dose_KDE_sf, dose_risk_sf) %>%
na.omit() %>%
gather(Variable, Value, -label, -Risk_Category, -geometry) %>%
ggplot() +
geom_sf(aes(fill = Risk_Category), colour = NA) +
geom_sf(data = sample_n(Overdose_Data, 3000), size = 0.75, colour = "black") +
facet_wrap(~label, ) +
scale_fill_manual(values = pal) +
labs(title="Comparison of Kernel Density and Risk Predictions",
subtitle="Heroin Overdose Risk Predictions; 2017-2022 Mesa, AZ\n") +
theme_void() + theme(legend.position="bottom")
Looking at the final chart below, we can confirm that the kernel density maps may heavily overstate high risk areas. However, again, it is uncertain whether our prediction models do the same for lower risk areas.
rbind(dose_KDE_sf, dose_risk_sf) %>%
st_drop_geometry() %>%
na.omit() %>%
gather(Variable, Value, -label, -Risk_Category) %>%
group_by(label, Risk_Category) %>%
summarize(countOverdose = sum(Value)) %>%
ungroup() %>%
group_by(label) %>%
mutate(Pcnt_of_test_set = countOverdose / sum(countOverdose)) %>%
ggplot(aes(Risk_Category,Pcnt_of_test_set)) +
geom_bar(aes(fill=label), position="dodge", stat="identity") +
scale_fill_manual(values = c("#242e47", "#88969e"), name = "Model") +
labs(title = "Risk prediction vs. Kernel density, 2017-2022 Heroin Overdoses",
y = "% of Test Set Overdoses (per model)",
x = "Risk Category") +
theme_minimal()+ theme(legend.position="bottom")
To conclude, this analysis explored the relationship between incidents of heroin overdoses, and therefor high risk areas, in Mesa with a variety of spatial and risk factors such as crime, EMS calls, homelessness, and different amenities. The purpose was to build a Poisson regression model to predict future occurrences of risk of heroin overdose to support the building of our FleetPulse app which aims to improve resource allocation among dedicated heroin overdose prevention fleets. Through Cross Validation techniques as well as by comparing our result to kernel density models, we were able to asses how accurate and generalizable our risk prediction models are.
The best model took into consideration all presented variables and spatial processes, namely the average distance to risk factors, and presented the best results using the k-fold CV method. This implied that our model did better at predicting across smaller scale spatial units, but was not ideal in generalizing across entire tracts. This same FleetPulse model, however predicted more accurately for majority white census tracts, which may suggest some potential bias in recording drug use and overdose data and therefore limits our ability to accurately predict future risk across the whole of Mesa. We can confirm however, that our model is presenting more inclusive predictions compared to other risk representation models such as Kernel Density and therefore could provide some utility for real world purposes.
In the second stage of development, the FleetPulse team hopes to incorporate temporal factors into the model, such as time of day/night, month and weather changes, as well as number of available vehicles and staff. This would ideally improve the utility of our app for the Overdose Prevention fleet organizational and resource allocation purposes. In doing so, these teams of public health respondents will have the ability to adjust their support delivery strategy as environmental and staffing factors change.